Sean Trott and Alex Liebscher

Introduction

We have several dependent variables that we would like to model as accurately as possible, in order to determine which variables contribute most to their outcomes. In particular, we care about the effect of metaphor (or the lack thereof) in each project. We first explore the data and then run a series of model comparisons for each dependent variable. Comparing models makes the contribution of each variable toward the target transparent. The analysis is organized around the following questions:

  1. Do projects that use metaphor generally receive better funding? How does using metaphor (and which type of metaphor family) influence campaign success, number of backers, and mean donation?

  2. Does higher metaphor productivity change the result?

  3. How can we characterize projects by the metaphors they employ?

  4. Within a metaphor family, are there canonical instantiations of a metaphor? (E.g. “fight battle”). Or even if productivity is low generally, are instantiations varied?

  5. How does metaphor vary with other interesting features, such as project description length, goal amount, project type/category, or cancer type?

Data Exploration

Projects per year

ggplot(dat) + geom_bar(aes(year), stat="count") + labs(title="Number of projects per year")

There are very few projects before 2013, so to simplify the model we remove projects from 2012 and earlier.

dat = dat[dat$year >= 2013, ]

Projects per month

dat$month <- factor(dat$month)

dat %>%
  ggplot() + labs(title="Counts in Months") +
  geom_bar(aes(month)) +
  scale_x_discrete(labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

Interestingly, there are a ton of January projects. This may be an artifact of the scraping, since the site lists projects chronologically.

Projects per day of the week

dat$day_of_week <- factor(dat$day_of_week)

dat %>%
  ggplot() + labs(title="Counts in Days of the Week") +
  geom_bar(aes(day_of_week)) +
  scale_x_discrete(labels = c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))

The distribution is fairly uniform, with slightly more projects launched in the middle of the week (notably Thursday) and a bit fewer launched on the weekend (notably Saturday).

chisq.test(table(dat$day_of_week))
## 
##  Chi-squared test for given probabilities
## 
## data:  table(dat$day_of_week)
## X-squared = 37.575, df = 6, p-value = 1.36e-06

There is a significant difference between the observed and the expected number of projects launched on each day of the week, where the expected counts assume a uniform distribution across days. It is therefore important to include this variable in the models.
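As a rough illustration of what the goodness-of-fit test above computes, the sketch below reproduces the statistic by hand on made-up day-of-week counts (the `counts` vector is hypothetical, not the project data) and checks it against `chisq.test`.

```r
# Hypothetical counts per weekday (NOT the real data), for illustration only
counts <- c(Mon = 50, Tue = 55, Wed = 60, Thu = 70, Fri = 52, Sat = 35, Sun = 48)

# Under the null hypothesis, every day is equally likely
expected <- sum(counts) / length(counts)

# Pearson chi-squared statistic and its p-value with (k - 1) degrees of freedom
stat <- sum((counts - expected)^2 / expected)
p    <- pchisq(stat, df = length(counts) - 1, lower.tail = FALSE)

# The built-in test with default (uniform) probabilities gives the same numbers
res <- chisq.test(counts)
print(c(manual = stat, builtin = unname(res$statistic)))
print(c(manual = p, builtin = res$p.value))
```

A small p-value only says that the counts are not uniform; it does not say which days drive the difference.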

Projects from the US

ggplot(dat) + geom_bar(aes(x=factor(from_US)), stat="count") + labs(x="From US", title="Number of projects from US")

Most projects are US-based, so we reduce model complexity by restricting the data to US projects. Including non-US projects would also require currency conversion and other adjustments.

dat <- dat[dat$from_US == 1, ]

Project goals, text lengths, and cancer types

We scale/z-score the continuous variables according to how we wish to interpret the coefficients of the model. From Andrew Gelman: “Standardizing puts things on an approximately common scale …. (Standardize for) comparing coefficients for different predictors within a model”. Binary and categorical variables are left as is.

dat$goal_sc <- scale(dat$goal)
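A minimal sketch of what `scale()` does, on hypothetical goal amounts (the vector `x` is made up for illustration). It also shows that `scale()` returns a one-column matrix, so wrapping it in `as.numeric()` is one way to get a plain vector if that is preferred.

```r
# Hypothetical goal amounts, for illustration only
x <- c(500, 1000, 2500, 10000, 50000)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same, but returns a matrix with centering/scaling attributes
z_scale <- scale(x)
print(is.matrix(z_scale))

# Drop the matrix structure to get an ordinary numeric vector
z_vec <- as.numeric(z_scale)
print(round(z_vec, 3))
```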

dat %>%
  ggplot() + labs(title="Goal Amount Distribution") +
  geom_density(aes(goal))

dat$duration_float_sc = scale(dat$duration_float)

dat %>%
  ggplot() + labs(title="Duration Distribution") +
  geom_density(aes(duration_float))

dat$text_length_words_sc <- scale(dat$text_length_words)

dat %>%
  ggplot() + labs(title="Text Length Distribution") +
  geom_density(aes(text_length_words))

dat$photos_sc <- scale(dat$photos)

dat %>%
  ggplot() + labs(title="Photos Distribution") +
  geom_density(aes(photos))

dat$updates_sc <- scale(dat$updates)

dat %>%
  ggplot() + labs(title="Updates Distribution") +
  geom_density(aes(updates))

dat$shares_sc <- scale(dat$shares)

dat %>%
  ggplot() + labs(title="FB Shares Distribution") +
  geom_density(aes(shares))

dat$comments_sc <- scale(dat$comments)

dat %>%
  ggplot() + labs(title="Comments Distribution") +
  geom_density(aes(comments))

dat$friends_sc <- scale(dat$friends)

dat %>%
  ggplot() + labs(title="FB Friends Distribution") +
  geom_density(aes(friends))

dat %>%
  ggplot() + labs(title="Cancer Type Counts") +
  geom_bar(aes(x=cancer_type)) +
  coord_flip()

na.omit(labs) %>% ggplot() + geom_bar(aes(x=reorder(keyword, keyword, function(x)-length(x)), fill=metaphorical)) + coord_flip() + theme_minimal()

Next, we take a look at differences in instantiation position across metaphor families.

ggplot() + 
  geom_density(aes(dat[dat$only_journey == T, "first_instantiation"]), fill=rgb(1,0.5,0.5), alpha=0.3) +
  geom_density(aes(dat[dat$only_battle == T, "first_instantiation"]), fill=rgb(0.5,0.5,1), alpha=0.3) +
  geom_density(aes(dat[dat$dom_battle == T, "first_instantiation"]), fill=rgb(0,0,1), alpha=0.3) +
  geom_density(aes(dat[dat$dom_journey == T, "first_instantiation"]), fill=rgb(1,0,0), alpha=0.3) +
  labs(x="Instantiation Position")

Project metaphors

We break down each project into how the metaphor families are distributed within the project text.

no_metaphor : Does the project lack metaphorical instances of keywords?
any_metaphor : Does the project contain any metaphors at all?
dom_journey : Is the journey metaphor family the dominant family?
dom_battle : Is the battle metaphor family the dominant family?
only_journey : Is the journey metaphor family the only family present?
only_battle : Is the battle metaphor family the only family present?
both_metaphor : Are both metaphor families present?

dat$no_metaphor = dat$battle_salience == 0.0 & dat$journey_salience == 0.0
dat$any_metaphor = as.logical(1 - dat$no_metaphor)
dat$dom_journey = dat$journey_salience > dat$battle_salience
dat$dom_battle = dat$battle_salience > dat$journey_salience
dat$only_journey = dat$journey_salience > 0 & dat$battle_salience == 0.0
dat$only_battle = dat$journey_salience == 0.0 & dat$battle_salience > 0
dat$both_metaphor = dat$battle_salience > 0.0 & dat$journey_salience > 0.0

dat$battle_prod = scale(dat$battle_prod)
dat$journey_prod = scale(dat$journey_prod)
dat$battle_salience = scale(dat$battle_salience)
dat$journey_salience = scale(dat$journey_salience)
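To make the indicator definitions concrete, here is a small sketch on a toy data frame (the salience values are invented). It uses the same comparisons as above and also illustrates why the indicators have to be computed before salience is z-scored: after scaling, a raw value of zero is no longer zero.

```r
# Toy salience values (hypothetical), one row per project
toy <- data.frame(
  battle_salience  = c(0, 0.2, 0,   0.3, 0.1),
  journey_salience = c(0, 0,   0.4, 0.1, 0.1)
)

# Same definitions as in the analysis
toy$no_metaphor   <- toy$battle_salience == 0 & toy$journey_salience == 0
toy$any_metaphor  <- !toy$no_metaphor
toy$dom_journey   <- toy$journey_salience > toy$battle_salience
toy$dom_battle    <- toy$battle_salience > toy$journey_salience
toy$only_journey  <- toy$journey_salience > 0 & toy$battle_salience == 0
toy$only_battle   <- toy$journey_salience == 0 & toy$battle_salience > 0
toy$both_metaphor <- toy$battle_salience > 0 & toy$journey_salience > 0
print(toy)

# After z-scoring, "no battle metaphor" is no longer encoded as exactly 0
battle_scaled <- as.numeric(scale(toy$battle_salience))
print(any(battle_scaled == 0))
```

Note that a project with equal battle and journey salience (the last toy row) counts as `both_metaphor` but is neither `dom_journey` nor `dom_battle`.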

metaphor_counts = data.frame(counts = colSums(dat[, c("no_metaphor", "any_metaphor", "dom_journey", "dom_battle", "only_journey", "only_battle", "both_metaphor")]))

ggplot() + labs(x="Metaphor Type", y="Count", title="Count in Metaphor Types") +
  geom_bar(stat="identity", aes(x=row.names(metaphor_counts), y=metaphor_counts$counts))

Good resources: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html https://ase.tufts.edu/gsc/gradresources/guidetomixedmodelsinr/mixed%20model%20guide.html

Primary Analyses

We are interested in the effect that metaphor presence has on the funding status, the number of backers, and the mean donation of a project.

DV: status, backers, mean donation

IV: Goal amount, text length words, duration, year, month, day of the week, photos, updates, FB shares, FB friends, cancer type

Many of these covariates explained in:

We perform drop1 model comparisons for each regression. The least significant (highest p-value) variable is removed in the subsequent model.
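The elimination procedure can be sketched as a loop. The example below is only an illustration under simplifying assumptions: it uses simulated data, a plain `glm` rather than `glmer`, and a 0.05 stopping threshold that we chose for the sketch; the names `toy`, `x1`, `x2`, and `x3` are hypothetical.

```r
set.seed(1)

# Simulated data: only x1 truly affects the binary outcome
n   <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * toy$x1))

form <- y ~ x1 + x2 + x3
repeat {
  fit <- glm(form, data = toy, family = binomial)
  tab <- drop1(fit, test = "Chisq")
  if (nrow(tab) < 2) break            # nothing left to drop

  # First row is "<none>"; the last column holds the LRT p-values
  pvals    <- tab[[ncol(tab)]][-1]
  terms_in <- rownames(tab)[-1]
  if (max(pvals) < 0.05) break        # every remaining term is significant

  # Remove the least significant term and refit on the next pass
  worst <- terms_in[which.max(pvals)]
  form  <- update(form, as.formula(paste("~ . -", worst)))
}
print(form)
```

In the analyses below the same steps are carried out by hand, one `update()` call at a time, so that each intermediate `drop1` table is visible.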

Various sources were used to determine what should be a random effect and what should be fixed, and how to model interactions and correlations between variables, including:

Status

nrow(dat)
## [1] 5223

In total, we have N=5223 IID samples to work with.

dat %>%
  ggplot() + labs(title="Goal Amount Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(goal, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Duration Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(duration_float, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Text Length Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(text_length_words, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Photos Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(photos, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Updates Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(updates, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="FB Friends Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(friends, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Comments Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(comments, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="FB Shares Distribution") + guides(color=guide_legend(title="Status")) +
  geom_density(aes(shares, color=fct_recode(factor(status), "Successful"="1", "Failed"="0"))) +
  theme_minimal()

dat %>%
  ggplot() + labs(title="Cancer Type Counts") + guides(fill=guide_legend(title="Status")) +
  geom_bar(aes(x=cancer_type, fill=fct_recode(factor(status), "Successful"="1", "Failed"="0")), position="dodge") +
  theme(axis.text.x=element_text(angle = 60, hjust=1))

We removed all random effects except year because they did not help explain any variance in the data beyond what the residuals could capture. Year is also a reasonable random effect (see Variance Components, Searle et al 2006).

Nesting and Chi2 differences: https://www.psychologie.uzh.ch/dam/jcr:ffffffff-b371-2797-0000-00000fda8f29/chisquare_diff_en.pdf

Model status

status.formula = status ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + duration_float_sc + cancer_type + month + day_of_week + (1|year)

status.mod = glmer(status.formula, data = dat, family = "binomial")

drop1(status.mod, test="Chisq")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0015143
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## status ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     text_length_words_sc + duration_float_sc + cancer_type + 
##     month + day_of_week + (1 | year)
##                      Df    AIC     LRT   Pr(Chi)    
## <none>                  4037.7                      
## shares_sc             1 4110.5  74.775 < 2.2e-16 ***
## friends_sc            1 4036.8   1.131 0.2875549    
## updates_sc            1 4073.2  37.497 9.153e-10 ***
## photos_sc             1 4047.9  12.230 0.0004702 ***
## goal_sc               1 4243.9 208.169 < 2.2e-16 ***
## text_length_words_sc  1 4036.4   0.712 0.3988437    
## duration_float_sc     1 4039.9   4.245 0.0393560 *  
## cancer_type          19 4050.0  50.353 0.0001163 ***
## month                11 4024.7   8.988 0.6230107    
## day_of_week           6 4028.1   2.376 0.8820764    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
status.formula = update(status.formula,  ~ . - day_of_week)
drop1(glmer(status.formula, data = dat, family = "binomial"), test="Chisq")
## Single term deletions
## 
## Model:
## status ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     text_length_words_sc + duration_float_sc + cancer_type + 
##     month + (1 | year)
##                      Df    AIC     LRT   Pr(Chi)    
## <none>                  4028.1                      
## shares_sc             1 4100.4  74.295 < 2.2e-16 ***
## friends_sc            1 4027.2   1.091  0.296194    
## updates_sc            1 4063.6  37.532 8.993e-10 ***
## photos_sc             1 4038.4  12.308  0.000451 ***
## goal_sc               1 4234.3 208.186 < 2.2e-16 ***
## text_length_words_sc  1 4026.8   0.683  0.408398    
## duration_float_sc     1 4030.4   4.306  0.037980 *  
## cancer_type          19 4040.9  50.826 9.895e-05 ***
## month                11 4014.8   8.750  0.644944    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
status.formula = update(status.formula,  ~ . - month)
drop1(glmer(status.formula, data = dat, family = "binomial"), test="Chisq")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00105419
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## status ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     text_length_words_sc + duration_float_sc + cancer_type + 
##     (1 | year)
##                      Df    AIC     LRT   Pr(Chi)    
## <none>                  4014.8                      
## shares_sc             1 4087.1  74.324 < 2.2e-16 ***
## friends_sc            1 4013.8   0.990 0.3196353    
## updates_sc            1 4050.0  37.194 1.069e-09 ***
## photos_sc             1 4024.7  11.915 0.0005567 ***
## goal_sc               1 4220.5 207.649 < 2.2e-16 ***
## text_length_words_sc  1 4013.4   0.599 0.4389389    
## duration_float_sc     1 4017.9   5.116 0.0237078 *  
## cancer_type          19 4026.6  49.802 0.0001401 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
status.formula = update(status.formula,  ~ . - text_length_words_sc)
drop1(glmer(status.formula, data = dat, family = "binomial"), test="Chisq")
## Single term deletions
## 
## Model:
## status ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     duration_float_sc + cancer_type + (1 | year)
##                   Df    AIC     LRT   Pr(Chi)    
## <none>               4013.4                      
## shares_sc          1 4085.6  74.171 < 2.2e-16 ***
## friends_sc         1 4012.5   1.074 0.3001359    
## updates_sc         1 4048.3  36.898 1.245e-09 ***
## photos_sc          1 4024.6  13.224 0.0002764 ***
## goal_sc            1 4220.2 208.742 < 2.2e-16 ***
## duration_float_sc  1 4016.7   5.250 0.0219464 *  
## cancer_type       19 4025.3  49.848 0.0001380 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
status.formula = update(status.formula,  ~ . - friends_sc)
drop1(glmer(status.formula, data = dat, family = "binomial"), test="Chisq")
## Single term deletions
## 
## Model:
## status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
##     cancer_type + (1 | year)
##                   Df    AIC     LRT   Pr(Chi)    
## <none>               4012.5                      
## shares_sc          1 4088.1  77.652 < 2.2e-16 ***
## updates_sc         1 4047.8  37.335 9.951e-10 ***
## photos_sc          1 4023.6  13.149 0.0002876 ***
## goal_sc            1 4220.1 209.612 < 2.2e-16 ***
## duration_float_sc  1 4015.8   5.273 0.0216588 *  
## cancer_type       19 4024.4  49.904 0.0001354 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The final base model before adding metaphor variables:

status.mod = glmer(status.formula, data = dat, family = "binomial")
summary(status.mod)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc +  
##     cancer_type + (1 | year)
##    Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##   4012.5   4183.1  -1980.2   3960.5     5197 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -53.680  -0.454  -0.363  -0.210  17.225 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  year   (Intercept) 0.02551  0.1597  
## Number of obs: 5223, groups:  year, 7
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  -2.15296    0.17550 -12.268  < 2e-16 ***
## shares_sc                     0.45709    0.05899   7.749 9.26e-15 ***
## updates_sc                   -0.38865    0.07506  -5.178 2.24e-07 ***
## photos_sc                     0.18121    0.04790   3.783 0.000155 ***
## goal_sc                      -1.37697    0.12696 -10.845  < 2e-16 ***
## duration_float_sc             0.15478    0.06288   2.461 0.013836 *  
## cancer_typebone cancer        0.01507    0.22978   0.066 0.947704    
## cancer_typebrain cancer       0.32189    0.23916   1.346 0.178324    
## cancer_typebreast cancer      0.15897    0.19554   0.813 0.416215    
## cancer_typecervical cancer   -0.94586    0.61857  -1.529 0.126240    
## cancer_typecolon cancer       0.25363    0.41479   0.611 0.540890    
## cancer_typeesophageal cancer  0.24387    0.25667   0.950 0.342050    
## cancer_typegeneral            0.27955    0.18827   1.485 0.137587    
## cancer_typekidney cancer      0.26660    0.21654   1.231 0.218247    
## cancer_typeleukemia           0.29242    0.24983   1.170 0.241804    
## cancer_typeliver cancer      -0.63377    0.37860  -1.674 0.094133 .  
## cancer_typelung cancer       -0.68675    0.24722  -2.778 0.005471 ** 
## cancer_typelymphoma           0.19587    0.20173   0.971 0.331583    
## cancer_typemelanoma           0.30797    0.25457   1.210 0.226381    
## cancer_typemixed              0.07048    0.20084   0.351 0.725634    
## cancer_typeneuroblastoma      0.72325    0.32729   2.210 0.027118 *  
## cancer_typepancreatic cancer -0.49943    0.76353  -0.654 0.513050    
## cancer_typeprostate cancer   -0.02417    0.56212  -0.043 0.965704    
## cancer_typeskin cancer       -0.02950    0.25328  -0.116 0.907279    
## cancer_typetesticular cancer  0.81721    0.31682   2.579 0.009896 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 25 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

Add metaphors

See https://stats.stackexchange.com/questions/77313/why-cant-i-match-glmer-family-binomial-output-with-manual-implementation-of-g and https://www.rdocumentation.org/packages/lme4/versions/1.1-19/topics/glmer

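# Refit with an added metaphor term, compare against the base model (status.mod)
# with a likelihood-ratio test, and print the fixed effect named by `key`.
status.refit = function(formula, key) {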
  new.mod = glmer(formula, data = dat, family = "binomial")
  print(anova(new.mod, status.mod, test="Chisq"))
  print(fixef(new.mod)[key])
}
addMetaphors = function(formula, refit) {
  refit(update(formula,  ~ . + no_metaphor), "no_metaphorTRUE")
  refit(update(formula,  ~ . + any_metaphor), "any_metaphorTRUE")
  refit(update(formula,  ~ . + both_metaphor), "both_metaphorTRUE")
  refit(update(formula,  ~ . + dom_journey), "dom_journeyTRUE")
  refit(update(formula,  ~ . + dom_journey + journey_prod), "journey_prod")
  refit(update(formula,  ~ . + dom_battle), "dom_battleTRUE")
  refit(update(formula,  ~ . + dom_battle + battle_prod), "battle_prod")
  refit(update(formula,  ~ . + only_battle), "only_battleTRUE")
  refit(update(formula,  ~ . + only_battle + battle_prod), "battle_prod")
  refit(update(formula,  ~ . + only_journey), "only_journeyTRUE")
  refit(update(formula,  ~ . + only_journey + journey_prod), "journey_prod")
  refit(update(formula,  ~ . + scale(battle_salience)), "scale(battle_salience)")
  refit(update(formula,  ~ . + scale(journey_salience)), "scale(journey_salience)")
}
addMetaphors(status.formula, status.refit)
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + no_metaphor
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4012.6 4189.7 -1979.3   3958.6 1.9141      1     0.1665
## no_metaphorTRUE 
##      -0.1158132 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + any_metaphor
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4012.6 4189.7 -1979.3   3958.6 1.9141      1     0.1665
## any_metaphorTRUE 
##        0.1158167 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + both_metaphor
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4012.4 4189.5 -1979.2   3958.4 2.1239      1      0.145
## both_metaphorTRUE 
##          0.213417 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + dom_journey
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4014.3 4191.4 -1980.2   3960.3 0.1995      1     0.6551
## dom_journeyTRUE 
##      0.07648559
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00126336
## (tol = 0.001, component 1)
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + dom_journey + journey_prod
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    28 4015.5 4199.2 -1979.8   3959.5 0.9787      2      0.613
## journey_prod 
##   0.04805398 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + dom_battle
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4013.4 4190.6 -1979.7   3959.4 1.0756      1     0.2997
## dom_battleTRUE 
##     0.08622314 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + dom_battle + battle_prod
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    28 4013.5 4197.2 -1978.7   3957.5 3.0347      2     0.2193
## battle_prod 
##  0.07207074 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + only_battle
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4014.3 4191.4 -1980.1   3960.3 0.2389      1      0.625
## only_battleTRUE 
##      0.04104324 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + only_battle + battle_prod
##            Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                        
## new.mod    28 4013.3 4197.0 -1978.7   3957.3 3.191      2     0.2028
## battle_prod 
##  0.07988441 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + only_journey
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4014.5 4191.6 -1980.2   3960.5 0.0317      1     0.8587
## only_journeyTRUE 
##       0.03350789 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + only_journey + journey_prod
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    28 4015.5 4199.2 -1979.7   3959.5 1.0388      2     0.5949
## journey_prod 
##   0.04798922 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + scale(battle_salience)
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4013.5 4190.7 -1979.8   3959.5 0.9706      1     0.3245
## scale(battle_salience) 
##             0.03905345 
## Data: dat
## Models:
## status.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## status.mod:     cancer_type + (1 | year)
## new.mod: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + 
## new.mod:     cancer_type + (1 | year) + scale(journey_salience)
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## status.mod 26 4012.5 4183.1 -1980.2   3960.5                         
## new.mod    27 4011.8 4188.9 -1978.9   3957.8 2.6928      1     0.1008
## scale(journey_salience) 
##              0.06218543
# inst.formula = status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + cancer_type + (1|year)
# inst.status.mod = glmer(inst.formula, data = dat[dat$only_journey, ], family = "binomial")
# inst.mod = glmer(update(inst.formula, ~ . + first_instantiation), data = dat[dat$only_journey, ], family = "binomial")
# anova(inst.mod, inst.status.mod, test="Chisq")
# fixef(inst.mod)["first_instantiation"]
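
Each comparison above is a likelihood-ratio test between the base model and the model with the added metaphor term(s). As a sanity check, the sketch below recomputes the p-value for the first comparison (`no_metaphor`) from the printed chi-squared statistic and degrees of freedom. The numbers are copied by hand from the output; the deviance difference only matches to the printed precision.

```r
# Values copied from the no_metaphor comparison above
chisq_stat <- 1.9141
df_diff    <- 27 - 26          # one extra parameter in the larger model

# Upper-tail chi-squared probability, as reported in Pr(>Chisq)
p_value <- pchisq(chisq_stat, df = df_diff, lower.tail = FALSE)
print(round(p_value, 4))

# The statistic is the drop in deviance between the two models (rounded output)
dev_diff <- 3960.5 - 3958.6
print(dev_diff)
```

The `no_metaphor` and `any_metaphor` comparisons are the same test, since one indicator is the logical negation of the other; only the sign of the coefficient flips.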

We also try out a Bayesian regression for status. This is not really developed yet and not fully Bayesian; we are just playing around. Instead of comparing a series of models again, we take the structure of the fitted glmer model and reuse it. The results are what one would expect, but they shed a little more light on how each variable affects the model.

library(brms)
## Loading required package: Rcpp
## Loading 'brms' package (version 2.8.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## 
## Attaching package: 'brms'
## The following object is masked from 'package:lme4':
## 
##     ngrps

Again, just reusing the glmer model for simplicity.

m = brm(status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + cancer_type + (1 | year), data = dat, family = "bernoulli", cores = 4)
## Compiling the C++ model
## Start sampling
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
summary(m)
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: status ~ shares_sc + updates_sc + photos_sc + goal_sc + duration_float_sc + cancer_type + (1 | year) 
##    Data: dat (Number of observations: 5223) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~year (Number of levels: 7) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.27      0.13     0.10     0.62       1236 1.01
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI
## Intercept                      -2.17      0.20    -2.58    -1.79
## shares_sc                       0.47      0.06     0.36     0.58
## updates_sc                     -0.39      0.07    -0.54    -0.25
## photos_sc                       0.18      0.05     0.08     0.27
## goal_sc                        -1.39      0.13    -1.65    -1.16
## duration_float_sc               0.16      0.08    -0.00     0.33
## cancer_typebonecancer           0.01      0.23    -0.45     0.46
## cancer_typebraincancer          0.32      0.24    -0.15     0.79
## cancer_typebreastcancer         0.17      0.20    -0.21     0.56
## cancer_typecervicalcancer      -1.09      0.67    -2.59     0.08
## cancer_typecoloncancer          0.20      0.42    -0.64     1.01
## cancer_typeesophagealcancer     0.24      0.25    -0.25     0.73
## cancer_typegeneral              0.28      0.19    -0.07     0.66
## cancer_typekidneycancer         0.27      0.21    -0.14     0.70
## cancer_typeleukemia             0.29      0.25    -0.21     0.76
## cancer_typelivercancer         -0.67      0.39    -1.46     0.05
## cancer_typelungcancer          -0.69      0.25    -1.19    -0.21
## cancer_typelymphoma             0.21      0.20    -0.17     0.60
## cancer_typemelanoma             0.31      0.26    -0.20     0.82
## cancer_typemixed                0.08      0.20    -0.30     0.47
## cancer_typeneuroblastoma        0.72      0.33     0.05     1.35
## cancer_typepancreaticcancer    -0.75      0.85    -2.69     0.71
## cancer_typeprostatecancer      -0.12      0.58    -1.31     0.95
## cancer_typeskincancer          -0.02      0.25    -0.52     0.47
## cancer_typetesticularcancer     0.81      0.32     0.16     1.44
##                             Eff.Sample Rhat
## Intercept                          925 1.00
## shares_sc                         3360 1.00
## updates_sc                        3286 1.00
## photos_sc                         3377 1.00
## goal_sc                           3302 1.00
## duration_float_sc                 2222 1.00
## cancer_typebonecancer             1275 1.00
## cancer_typebraincancer            1329 1.00
## cancer_typebreastcancer           1071 1.00
## cancer_typecervicalcancer         3072 1.00
## cancer_typecoloncancer            2611 1.00
## cancer_typeesophagealcancer       1544 1.00
## cancer_typegeneral                 990 1.00
## cancer_typekidneycancer           1228 1.00
## cancer_typeleukemia               1302 1.00
## cancer_typelivercancer            2045 1.00
## cancer_typelungcancer             1645 1.00
## cancer_typelymphoma               1086 1.00
## cancer_typemelanoma               1530 1.00
## cancer_typemixed                  1089 1.00
## cancer_typeneuroblastoma          1952 1.00
## cancer_typepancreaticcancer       3451 1.00
## cancer_typeprostatecancer         2820 1.00
## cancer_typeskincancer             1523 1.00
## cancer_typetesticularcancer       1839 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
marginal_effects(m)

Number of Backers

library(glmmTMB)
nrow(dat[dat$backers > 1200, ])
## [1] 25
ggplot() + labs(x="Number of Backers", title="Number of Backers Density") +
  geom_density(aes(dat$backers[dat$backers < 1200]))

We keep only projects with fewer than 1,200 backers: the 25 projects above that threshold are outliers, and removing them leaves us with a nicely shaped distribution.

dat.b <- dat[dat$backers < 1200, ]

Run a quick data dispersion test (see Rice 1995):

# Likelihood-ratio statistic for a Poisson fit, compared to a chi-squared with n - 1 df
y = dat.b$backers
pchisq(2 * sum(y * log(y / mean(y))), length(y) - 1, lower.tail = FALSE)
## [1] 0

H0: The data are fit well by a Poisson distribution.

H1: The Poisson distribution fails to fit the data well.

Since p ≈ 0, we reject H0: the Poisson distribution does not fit the data well. Let’s use a negative binomial (NegBin) instead, which lets the variance differ from the mean.
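As a rough cross-check on the overdispersion claim, the variance-to-mean ratio and the Pearson index-of-dispersion statistic can be computed directly. The sketch below is only illustrative: it uses simulated counts (`y_sim`, a hypothetical stand-in for `dat.b$backers`), and the simulation parameters are assumptions, not estimates from our data.

```r
# Simulated overdispersed counts (hypothetical stand-in for dat.b$backers)
set.seed(1)
y_sim <- rnbinom(5000, size = 1.6, mu = 70)

# Under a Poisson model the variance equals the mean, so this ratio is near 1
dispersion_ratio <- var(y_sim) / mean(y_sim)

# Index-of-dispersion statistic: approximately chi-squared with n - 1 df under Poisson
index_stat <- sum((y_sim - mean(y_sim))^2) / mean(y_sim)
p_value <- pchisq(index_stat, df = length(y_sim) - 1, lower.tail = FALSE)

print(c(ratio = dispersion_ratio, p = p_value))
```

On the real data one would replace `y_sim` with `dat.b$backers`; a ratio far above 1 points the same way as the test above.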

# alpha is set as a constant (outside aes) so points are actually drawn semi-transparent
dat.b %>%
  ggplot(aes(goal, backers)) + labs(title="Goal Amount Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(duration_float, backers)) + labs(title="Duration Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(text_length_words, backers)) + labs(title="Text Length Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(photos, backers)) + labs(title="Photos Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(updates, backers)) + labs(title="Updates Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(friends, backers)) + labs(title="FB Friends Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(shares, backers)) + labs(title="FB Shares Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.b %>%
  ggplot(aes(cancer_type, backers)) + labs(title="Cancer Types") +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x=element_text(angle = 60, hjust=1))

Model backers

A quasi-Poisson model is a possible alternative, see: https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1141&context=usdeptcommercepub

However, the negative binomial fit has a lower residual deviance than the quasi-Poisson fit, so we use NegBin.

The number of backers is inherently greater than 0, so we fit a zero-truncated NegBin model.
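To make the truncation concrete, here is a minimal sketch of a zero-truncated negative binomial probability mass function, written with base R's `dnbinom` in its `mu`/`size` parameterization. We assume this matches the `nbinom2` variance form (variance = mu + mu^2/size). The function name `dtrunc_nbinom` and the parameter values are hypothetical, chosen only for illustration.

```r
# P(Y = y | Y > 0): renormalise the negative binomial after removing the mass at zero
dtrunc_nbinom <- function(y, mu, size) {
  p0 <- dnbinom(0, size = size, mu = mu)              # probability of a zero count
  ifelse(y > 0, dnbinom(y, size = size, mu = mu) / (1 - p0), 0)
}

# Illustrative parameter values (assumptions, not fitted estimates)
mu_demo <- 70
size_demo <- 1.63

# The truncated probabilities over the positive integers should sum to about 1
total_mass <- sum(dtrunc_nbinom(1:100000, mu = mu_demo, size = size_demo))
print(total_mass)
```

Zero gets probability 0 by construction, and the remaining mass is rescaled so that it still sums to 1.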

backers.formula = backers ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + duration_float_sc + cancer_type + month + day_of_week + (1|year)

backers.mod = glmmTMB(backers.formula, data = dat.b, family = "truncated_nbinom2")
drop1(backers.mod, test="Chisq")
## Single term deletions
## 
## Model:
## backers ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     text_length_words_sc + duration_float_sc + cancer_type + 
##     month + day_of_week + (1 | year)
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  55273                      
## shares_sc             1 56381 1110.58 < 2.2e-16 ***
## friends_sc            1 55272    1.43   0.23147    
## updates_sc            1 55286   15.57 7.943e-05 ***
## photos_sc             1 55349   78.22 < 2.2e-16 ***
## goal_sc               1 55642  370.80 < 2.2e-16 ***
## text_length_words_sc  1 55320   48.94 2.640e-12 ***
## duration_float_sc     1 55275    4.33   0.03746 *  
## cancer_type          19 55614  379.46 < 2.2e-16 ***
## month                11 55268   17.25   0.10067    
## day_of_week           6 55264    3.44   0.75248    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
backers.formula = update(backers.formula,  ~ . - day_of_week)
drop1(glmmTMB(backers.formula, data = dat.b, family = "truncated_nbinom2"), test="Chisq")
## Single term deletions
## 
## Model:
## backers ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + 
##     text_length_words_sc + duration_float_sc + cancer_type + 
##     month + (1 | year)
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  55264                      
## shares_sc             1 56373 1111.18 < 2.2e-16 ***
## friends_sc            1 55264    1.49   0.22157    
## updates_sc            1 55278   16.33 5.325e-05 ***
## photos_sc             1 55341   79.32 < 2.2e-16 ***
## goal_sc               1 55635  372.56 < 2.2e-16 ***
## text_length_words_sc  1 55311   48.56 3.209e-12 ***
## duration_float_sc     1 55267    4.38   0.03626 *  
## cancer_type          19 55607  380.42 < 2.2e-16 ***
## month                11 55260   17.40   0.09651 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
backers.formula = update(backers.formula,  ~ . - friends_sc)
drop1(glmmTMB(backers.formula, data = dat.b, family = "truncated_nbinom2"), test="Chisq")
## Single term deletions
## 
## Model:
## backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + 
##     duration_float_sc + cancer_type + month + (1 | year)
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  55264                      
## shares_sc             1 56418 1156.62 < 2.2e-16 ***
## updates_sc            1 55279   16.98 3.783e-05 ***
## photos_sc             1 55341   79.76 < 2.2e-16 ***
## goal_sc               1 55633  371.14 < 2.2e-16 ***
## text_length_words_sc  1 55312   50.02 1.525e-12 ***
## duration_float_sc     1 55266    4.43   0.03539 *  
## cancer_type          19 55608  382.15 < 2.2e-16 ***
## month                11 55259   17.28   0.09986 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
backers.formula = update(backers.formula,  ~ . - month)
drop1(glmmTMB(backers.formula, data = dat.b, family = "truncated_nbinom2"), test="Chisq")
## Single term deletions
## 
## Model:
## backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + 
##     duration_float_sc + cancer_type + (1 | year)
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  55259                      
## shares_sc             1 56408 1151.56 < 2.2e-16 ***
## updates_sc            1 55272   15.56 8.003e-05 ***
## photos_sc             1 55334   76.91 < 2.2e-16 ***
## goal_sc               1 55631  373.68 < 2.2e-16 ***
## text_length_words_sc  1 55308   50.68 1.085e-12 ***
## duration_float_sc     1 55261    3.90   0.04836 *  
## cancer_type          19 55602  381.07 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
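The manual sequence above (run `drop1`, remove the least significant term, refit) can also be written as a loop. The following is a sketch on simulated data with a plain Poisson `glm`, not the `glmmTMB` model fitted here; the data frame `d_sim`, its columns, and the 0.05 cutoff are assumptions for illustration.

```r
# Simulated data: x1 and x2 affect the count, noise does not
set.seed(2)
n_sim <- 500
d_sim <- data.frame(x1 = rnorm(n_sim), x2 = rnorm(n_sim), noise = rnorm(n_sim))
d_sim$y <- rpois(n_sim, exp(0.5 + 0.6 * d_sim$x1 - 0.4 * d_sim$x2))

fit_sim <- glm(y ~ x1 + x2 + noise, data = d_sim, family = poisson)

# Backward elimination: drop the term with the largest LRT p-value until all are < 0.05
repeat {
  tab <- drop1(fit_sim, test = "Chisq")
  p_vals <- tab[["Pr(>Chi)"]][-1]        # first row is <none>
  term_names <- rownames(tab)[-1]
  if (length(p_vals) == 0 || max(p_vals) < 0.05) break
  worst <- term_names[which.max(p_vals)]
  fit_sim <- update(fit_sim, as.formula(paste(". ~ . -", worst)))
}

kept_terms <- attr(terms(fit_sim), "term.labels")
print(kept_terms)
```

Doing the steps by hand, as above, has the advantage that every intermediate table is visible in the report.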
backers.mod = glmmTMB(backers.formula, data = dat.b, family = "truncated_nbinom2")
summary(backers.mod)
##  Family: truncated_nbinom2  ( log )
## Formula:          
## backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc +  
##     duration_float_sc + cancer_type + (1 | year)
## Data: dat.b
## 
##      AIC      BIC   logLik deviance df.resid 
##  55258.9  55442.5 -27601.5  55202.9     5170 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  year   (Intercept) 0.005051 0.07107 
## Number of obs: 5198, groups:  year, 7
## 
## Overdispersion parameter for truncated_nbinom2 family (): 1.63 
## 
## Conditional model:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   4.21530    0.05243   80.39  < 2e-16 ***
## shares_sc                     0.78491    0.02604   30.14  < 2e-16 ***
## updates_sc                   -0.05591    0.01351   -4.14 3.48e-05 ***
## photos_sc                     0.12141    0.01494    8.12 4.50e-16 ***
## goal_sc                       0.31690    0.01802   17.59  < 2e-16 ***
## text_length_words_sc          0.08777    0.01278    6.87 6.41e-12 ***
## duration_float_sc             0.04963    0.02273    2.18  0.02900 *  
## cancer_typebone cancer        0.28294    0.06069    4.66 3.13e-06 ***
## cancer_typebrain cancer       0.30876    0.06591    4.68 2.80e-06 ***
## cancer_typebreast cancer      0.41010    0.05313    7.72 1.17e-14 ***
## cancer_typecervical cancer   -0.12754    0.12123   -1.05  0.29279    
## cancer_typecolon cancer       0.21059    0.11392    1.85  0.06454 .  
## cancer_typeesophageal cancer  0.35223    0.07089    4.97 6.75e-07 ***
## cancer_typegeneral            0.32523    0.05150    6.32 2.69e-10 ***
## cancer_typekidney cancer      0.24706    0.05992    4.12 3.74e-05 ***
## cancer_typeleukemia           0.54847    0.06900    7.95 1.88e-15 ***
## cancer_typeliver cancer      -0.28086    0.08206   -3.42  0.00062 ***
## cancer_typelung cancer       -0.27654    0.05692   -4.86 1.18e-06 ***
## cancer_typelymphoma           0.38982    0.05530    7.05 1.81e-12 ***
## cancer_typemelanoma          -0.02933    0.07412   -0.40  0.69235    
## cancer_typemixed              0.10175    0.05402    1.88  0.05961 .  
## cancer_typeneuroblastoma      0.46280    0.09532    4.86 1.20e-06 ***
## cancer_typepancreatic cancer  0.47055    0.14649    3.21  0.00132 ** 
## cancer_typeprostate cancer    0.05725    0.13385    0.43  0.66887    
## cancer_typeskin cancer       -0.05829    0.06805   -0.86  0.39169    
## cancer_typetesticular cancer  0.55171    0.10349    5.33 9.77e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Add metaphors

backers.refit = function(formula, key) {
  new.mod = glmmTMB(formula, data = dat.b, family = "truncated_nbinom2")
  print(anova(new.mod, backers.mod, test="Chisq"))
  print(fixef(new.mod)$cond[key])
}

addMetaphors(backers.formula, backers.refit)
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + no_metaphor, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     29 55226 55417 -27584    55168 34.525      1  4.208e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## no_metaphorTRUE 
##      -0.1351643 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + any_metaphor, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     29 55226 55417 -27584    55168 34.525      1  4.208e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## any_metaphorTRUE 
##        0.1351615 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + both_metaphor, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## backers.mod 28 55259 55443 -27602    55203                           
## new.mod     29 55257 55448 -27600    55199 3.4421      1    0.06356 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## both_metaphorTRUE 
##        0.07510275 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + dom_journey, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## backers.mod 28 55259 55443 -27602    55203                           
## new.mod     29 55257 55447 -27599    55199 4.3652      1    0.03668 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## dom_journeyTRUE 
##      0.09541535 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + dom_journey + , zi=~0, disp=~1
## new.mod:     journey_prod, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## backers.mod 28 55259 55443 -27602    55203                        
## new.mod     30 55258 55455 -27599    55198 4.441      2     0.1086
## journey_prod 
##   0.00413971 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + dom_battle, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     29 55243 55433 -27593    55185 17.834      1  2.411e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## dom_battleTRUE 
##     0.09490447 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + dom_battle + , zi=~0, disp=~1
## new.mod:     battle_prod, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     30 55241 55437 -27590    55181 22.372      2  1.387e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## battle_prod 
##  0.03180909 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + only_battle, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     29 55244 55434 -27593    55186 16.856      1  4.033e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## only_battleTRUE 
##      0.09245179 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + only_battle + , zi=~0, disp=~1
## new.mod:     battle_prod, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     30 55239 55436 -27590    55179 23.595      2  7.524e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## battle_prod 
##  0.03483742 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + only_journey, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
## backers.mod 28 55259 55443 -27602    55203                         
## new.mod     29 55259 55449 -27601    55201 1.6094      1     0.2046
## only_journeyTRUE 
##       0.06379286 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + only_journey + , zi=~0, disp=~1
## new.mod:     journey_prod, zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
## backers.mod 28 55259 55443 -27602    55203                         
## new.mod     30 55260 55457 -27600    55200 2.7695      2     0.2504
## journey_prod 
##   0.01413687 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + scale(battle_salience), zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## backers.mod 28 55259 55443 -27602    55203                             
## new.mod     29 55244 55434 -27593    55186 17.343      1   3.12e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## scale(battle_salience) 
##             0.04735853 
## Data: dat.b
## Models:
## backers.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## backers.mod:     duration_float_sc + cancer_type + (1 | year), zi=~0, disp=~1
## new.mod: backers ~ shares_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + , zi=~0, disp=~1
## new.mod:     duration_float_sc + cancer_type + (1 | year) + scale(journey_salience), zi=~0, disp=~1
##             Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)   
## backers.mod 28 55259 55443 -27602    55203                            
## new.mod     29 55252 55442 -27597    55194 8.8047      1   0.003005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## scale(journey_salience) 
##              0.03296808

OLD, IGNORE: On average, not having metaphors present lowers the expected value of log(backers) by 17.3%. When battle metaphors are dominant, the expected value of log(backers) increases by 13.1%. When battle metaphors are dominant, then an additional unit of battle productivity increases the expected value of log(backers) by 1.1%. When there are only battle metaphors present, we see an increase of 12.9%. Unit increases in battle salience lead to 7.2% increases in log(backers) expected value, and similarly, a unit increase in journey salience leads to a 3.8% increase.
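With a log link, a coefficient b multiplies the modelled mean by exp(b), so the percent change is 100 * (exp(b) - 1). The sketch below applies this to a few of the coefficients printed in the output above. The vector name `coefs_backers` and the element labels are ours, and for the truncated family we assume the log link acts on the mean of the untruncated distribution, so read the percentages as approximate changes in expected backers.

```r
# Coefficients copied from the model comparison output above
coefs_backers <- c(
  no_metaphor         = -0.1351643,
  dom_battle          =  0.09490447,
  only_battle         =  0.09245179,
  battle_salience_sc  =  0.04735853,  # per standard deviation, since the predictor is scaled
  journey_salience_sc =  0.03296808   # per standard deviation, since the predictor is scaled
)

# Percent change in the modelled mean for a one-unit increase in each predictor
pct_change <- 100 * (exp(coefs_backers) - 1)
print(round(pct_change, 1))
```

For example, the `no_metaphor` coefficient corresponds to roughly 12.6% fewer expected backers, and `dom_battle` to roughly 10% more.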

Mean Donation

dat %>%
  ggplot() + labs(title="Mean Donation Density") +
  geom_density(aes(mean_donation))

dat.m = dat[dat$mean_donation < 500, ]

dat.m %>%
  ggplot() + labs(title="Mean Donation Density") +
  geom_density(aes(mean_donation))

dat.m %>%
  ggplot() + geom_qq(aes(sample=mean_donation+1), distribution = qexp) + geom_qq_line(aes(sample=mean_donation+1), distribution = qexp)

# alpha is set as a constant (outside aes) so points are actually drawn semi-transparent
dat.m %>%
  ggplot(aes(goal, mean_donation)) + labs(title="Goal Amount Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(duration_float, mean_donation)) + labs(title="Duration Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(text_length_words, mean_donation)) + labs(title="Text Length Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(photos, mean_donation)) + labs(title="Photos Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(updates, mean_donation)) + labs(title="Updates Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(friends, mean_donation)) + labs(title="FB Friends Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(shares, mean_donation)) + labs(title="FB Shares Distribution") +
  geom_point(alpha = 0.1) +
  theme_minimal()

dat.m %>%
  ggplot(aes(cancer_type, mean_donation)) + labs(title="Cancer Types") +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x=element_text(angle = 60, hjust=1))

The data look like a Gamma distribution, so we model mean donation with a Gamma family and a log link (see above answer for reason).

Ideally, we would run a distributional test to confirm (or reject) this intuition.
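One way such a check could look is sketched below: estimate the Gamma shape and rate by the method of moments, then compare the sample to the fitted distribution with a Kolmogorov-Smirnov test. This is a sketch on simulated values (`y_gamma`, a hypothetical stand-in for `dat.m$mean_donation`), not a result from our data. Because the parameters are estimated from the same sample, the reported p-value is only approximate and tends to be too lenient.

```r
# Simulated positive, right-skewed values (hypothetical stand-in for dat.m$mean_donation)
set.seed(3)
y_gamma <- rgamma(2000, shape = 2, rate = 0.02)

# Method-of-moments estimates: mean = shape / rate, variance = shape / rate^2
shape_hat <- mean(y_gamma)^2 / var(y_gamma)
rate_hat <- mean(y_gamma) / var(y_gamma)

# Kolmogorov-Smirnov comparison against the fitted Gamma distribution
ks_fit <- ks.test(y_gamma, "pgamma", shape = shape_hat, rate = rate_hat)
print(c(shape = shape_hat, rate = rate_hat, D = unname(ks_fit$statistic)))
```

A small D statistic means the empirical and fitted distribution functions stay close; a Gamma quantile-quantile plot would be a natural visual companion to this check.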

Model mean donation

mean.formula = mean_donation ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + duration_float_sc + cancer_type + month + day_of_week + (1|year)

mean.mod = glmer(mean.formula, data = dat.m, family = Gamma(link = "log"))
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0779305
## (tol = 0.001, component 1)
drop1(mean.mod, test="Chisq")
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0707713
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0603187
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0492307
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0681091
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0716196
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0470877
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0482949
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0364948
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.013786
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0487506
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## mean_donation ~ shares_sc + friends_sc + updates_sc + photos_sc + 
##     goal_sc + text_length_words_sc + duration_float_sc + cancer_type + 
##     month + day_of_week + (1 | year)
##                      Df   AIC     LRT   Pr(Chi)    
## <none>                  52853                      
## shares_sc             1 52910  58.920 1.643e-14 ***
## friends_sc            1 52903  52.754 3.780e-13 ***
## updates_sc            1 52865  14.697 0.0001263 ***
## photos_sc             1 52853   2.691 0.1009392    
## goal_sc               1 53105 254.324 < 2.2e-16 ***
## text_length_words_sc  1 52867  15.893 6.703e-05 ***
## duration_float_sc     1 52851   0.007 0.9348693    
## cancer_type          19 52889  74.587 1.564e-08 ***
## month                11 52850  19.666 0.0501381 .  
## day_of_week           6 52848   7.521 0.2753432    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mean.formula = update(mean.formula,  ~ . - duration_float_sc)
drop1(glmer(mean.formula, data = dat.m, family = Gamma(link = "log")), test="Chisq")
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0482949
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0446488
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0326906
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0576884
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0499333
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0567112
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0487179
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00549489
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00191435
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0455086
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## mean_donation ~ shares_sc + friends_sc + updates_sc + photos_sc + 
##     goal_sc + text_length_words_sc + cancer_type + month + day_of_week + 
##     (1 | year)
##                      Df   AIC     LRT   Pr(Chi)    
## <none>                  52851                      
## shares_sc             1 52908  58.917 1.644e-14 ***
## friends_sc            1 52901  52.744 3.801e-13 ***
## updates_sc            1 52864  14.909 0.0001128 ***
## photos_sc             1 52851   2.710 0.0997389 .  
## goal_sc               1 53103 254.322 < 2.2e-16 ***
## text_length_words_sc  1 52865  15.910 6.642e-05 ***
## cancer_type          19 52887  74.727 1.480e-08 ***
## month                11 52848  19.733 0.0491457 *  
## day_of_week           6 52846   7.507 0.2764528    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mean.formula = update(mean.formula,  ~ . - day_of_week)
drop1(glmer(mean.formula, data = dat.m, family = Gamma(link = "log")), test="Chisq")
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0455086
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0302281
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0338187
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0608278
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0273769
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0401886
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0353224
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00344377
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00150837
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## mean_donation ~ shares_sc + friends_sc + updates_sc + photos_sc + 
##     goal_sc + text_length_words_sc + cancer_type + month + (1 | 
##     year)
##                      Df   AIC     LRT   Pr(Chi)    
## <none>                  52846                      
## shares_sc             1 52904  59.653 1.131e-14 ***
## friends_sc            1 52896  52.149 5.144e-13 ***
## updates_sc            1 52859  15.118  0.000101 ***
## photos_sc             1 52847   2.937  0.086566 .  
## goal_sc               1 53097 253.317 < 2.2e-16 ***
## text_length_words_sc  1 52860  15.898 6.684e-05 ***
## cancer_type          19 52882  74.317 1.737e-08 ***
## month                11 52844  20.302  0.041362 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mean.formula = update(mean.formula,  ~ . - photos_sc)
drop1(glmer(mean.formula, data = dat.m, family = Gamma(link = "log")), test="Chisq")
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0273769
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0277174
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0486968
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0518222
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0411094
## (tol = 0.001, component 1)
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0159764
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00188271
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00122667
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
##     text_length_words_sc + cancer_type + month + (1 | year)
##                      Df   AIC     LRT   Pr(Chi)    
## <none>                  52847                      
## shares_sc             1 52905  59.823 1.038e-14 ***
## friends_sc            1 52898  52.487 4.332e-13 ***
## updates_sc            1 52858  12.834 0.0003403 ***
## goal_sc               1 53096 250.465 < 2.2e-16 ***
## text_length_words_sc  1 52859  14.129 0.0001707 ***
## cancer_type          19 52884  75.208 1.227e-08 ***
## month                11 52845  19.609 0.0509983 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mean.formula = update(mean.formula,  ~ . - month)
drop1(glmer(mean.formula, data = dat.m, family = Gamma(link = "log")), test="Chisq")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00122667
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00140763
## (tol = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00128215
## (tol = 0.001, component 1)
## Single term deletions
## 
## Model:
## mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
##     text_length_words_sc + cancer_type + (1 | year)
##                      Df   AIC     LRT   Pr(Chi)    
## <none>                  52845                      
## shares_sc             1 52901  58.695 1.841e-14 ***
## friends_sc            1 52895  52.287 4.796e-13 ***
## updates_sc            1 52854  11.739 0.0006119 ***
## goal_sc               1 53091 248.482 < 2.2e-16 ***
## text_length_words_sc  1 52857  14.259 0.0001593 ***
## cancer_type          19 52882  75.322 1.174e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
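
The repeated `update()` / `drop1()` steps above follow a backward-elimination pattern: refit, find the least useful term by its likelihood-ratio p-value, remove it, and repeat. A minimal sketch of that loop is given below. To stay self-contained it swaps in base R `glm()` with a Gamma log link rather than `glmer()`, and it runs on simulated data; `backward_eliminate`, `sim`, `x_signal`, `x_noise`, and the 0.05 cutoff are all hypothetical choices for illustration, not part of the analysis above.

```r
# Simulated data (assumption): one real predictor, one pure-noise predictor.
set.seed(1)
n <- 400
sim <- data.frame(x_signal = rnorm(n), x_noise = rnorm(n))
# Gamma response whose mean is exp(1 + 0.5 * x_signal).
sim$y <- rgamma(n, shape = 4, rate = 4 / exp(1 + 0.5 * sim$x_signal))

# Drop the term with the largest p-value until every remaining term
# has p <= alpha (or no terms are left).
backward_eliminate <- function(form, data, alpha = 0.05) {
  repeat {
    fit <- glm(form, data = data, family = Gamma(link = "log"))
    tab <- drop1(fit, test = "Chisq")
    p <- tab[["Pr(>Chi)"]][-1]          # first row is <none>
    terms_left <- rownames(tab)[-1]
    if (length(p) == 0 || max(p) <= alpha) return(form)
    worst <- terms_left[which.max(p)]
    form <- update(form, as.formula(paste("~ . -", worst)))
  }
}

final_form <- backward_eliminate(y ~ x_signal + x_noise, sim)
print(final_form)
```

Automating the loop removes the manual copy-and-paste between steps, but it also hides borderline calls such as `month` (p = 0.051 above), so printing each intermediate `drop1()` table may still be worthwhile.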
mean.mod = glmer(mean.formula, data = dat.m, family = Gamma(link = "log"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00122667
## (tol = 0.001, component 1)
summary(mean.mod)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Gamma  ( log )
## Formula: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc +  
##     text_length_words_sc + cancer_type + (1 | year)
##    Data: dat.m
## 
##      AIC      BIC   logLik deviance df.resid 
##  52844.7  53021.8 -26395.3  52790.7     5185 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9367 -0.6379 -0.1784  0.3914  9.8292 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  year     (Intercept) 0.001385 0.03722 
##  Residual             0.237275 0.48711 
## Number of obs: 5212, groups:  year, 7
## 
## Fixed effects:
##                               Estimate Std. Error t value Pr(>|z|)    
## (Intercept)                   4.557519   0.033615 135.581  < 2e-16 ***
## shares_sc                    -0.054571   0.006445  -8.467  < 2e-16 ***
## friends_sc                   -0.046001   0.006203  -7.416  1.2e-13 ***
## updates_sc                    0.024870   0.007371   3.374 0.000741 ***
## goal_sc                       0.117933   0.008025  14.696  < 2e-16 ***
## text_length_words_sc          0.024569   0.006597   3.725 0.000196 ***
## cancer_typebone cancer       -0.132080   0.034109  -3.872 0.000108 ***
## cancer_typebrain cancer      -0.047847   0.037123  -1.289 0.197439    
## cancer_typebreast cancer     -0.059081   0.029870  -1.978 0.047936 *  
## cancer_typecervical cancer   -0.142279   0.068179  -2.087 0.036901 *  
## cancer_typecolon cancer      -0.098172   0.064305  -1.527 0.126843    
## cancer_typeesophageal cancer  0.016594   0.040043   0.414 0.678585    
## cancer_typegeneral           -0.076316   0.028674  -2.661 0.007780 ** 
## cancer_typekidney cancer     -0.048947   0.033722  -1.451 0.146646    
## cancer_typeleukemia          -0.044285   0.038584  -1.148 0.251069    
## cancer_typeliver cancer      -0.041028   0.045938  -0.893 0.371794    
## cancer_typelung cancer        0.036893   0.031975   1.154 0.248580    
## cancer_typelymphoma          -0.101718   0.031182  -3.262 0.001106 ** 
## cancer_typemelanoma           0.033878   0.041833   0.810 0.418024    
## cancer_typemixed             -0.020183   0.030443  -0.663 0.507337    
## cancer_typeneuroblastoma     -0.063230   0.053805  -1.175 0.239927    
## cancer_typepancreatic cancer -0.028213   0.080955  -0.349 0.727462    
## cancer_typeprostate cancer    0.261529   0.075629   3.458 0.000544 ***
## cancer_typeskin cancer       -0.061921   0.038340  -1.615 0.106298    
## cancer_typetesticular cancer -0.122210   0.058088  -2.104 0.035389 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 25 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## Model failed to converge with max|grad| = 0.00122667 (tol = 0.001, component 1)
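
Because the model uses a log link, each fixed-effect estimate is additive on the log scale and multiplicative on the scale of `mean_donation`. The sketch below back-transforms a few estimates copied from the summary above into percent changes. It assumes the `_sc` predictors are standardized, so a one-unit change corresponds to one standard deviation; the names `est`, `ratio`, and `pct` are illustrative only.

```r
# Estimates copied by hand from the summary(mean.mod) output (log scale).
est <- c(
  goal_sc = 0.117933,
  shares_sc = -0.054571,
  "cancer_typeprostate cancer" = 0.261529
)

# exp() turns a log-scale coefficient into a multiplicative ratio.
ratio <- exp(est)

# Percent change in expected mean donation per one-unit increase
# (or, for cancer_type, relative to the reference cancer type).
pct <- 100 * (ratio - 1)
print(round(pct, 1))
```

Read this way, the `goal_sc` estimate corresponds to roughly a 12.5% higher expected mean donation per unit of the scaled goal, holding the other predictors fixed.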

For comparison, an lmer fit with the same formula, shares_sc + friends_sc + updates_sc + goal_sc + text_length_words_sc + cancer_type + (1 | year), produces an AIC of 54710, compared with 52845 for the Gamma glmer model above.

Add metaphors

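# Refit the mean-donation model with an added metaphor term and report its effect.
mean.refit = function(formula, key) {
  # Same Gamma GLMM as mean.mod, but using the extended formula
  new.mod = glmer(formula, data = dat.m, family = Gamma(link = "log"))
  # Likelihood-ratio comparison against the baseline mean.mod (not the previous refit)
  print(anova(new.mod, mean.mod, test="Chisq"))
  # Fixed-effect estimate (log scale) for the term named by key
  print(fixef(new.mod)[key])
}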

addMetaphors(mean.formula, mean.refit)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00109076
## (tol = 0.001, component 1)
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + no_metaphor
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  28 52812 52996 -26378    52756 34.678      1  3.891e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## no_metaphorTRUE 
##     -0.07652165 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + any_metaphor
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  28 52812 52996 -26378    52756 34.678      1  3.891e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## any_metaphorTRUE 
##        0.0765246
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00120009
## (tol = 0.001, component 1)
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + both_metaphor
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
## mean.mod 27 52845 53022 -26395    52791                         
## new.mod  28 52846 53030 -26395    52790 0.7534      1     0.3854
## both_metaphorTRUE 
##        0.02007234 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + dom_journey
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mean.mod 27 52845 53022 -26395    52791                           
## new.mod  28 52841 53025 -26392    52785 5.7629      1    0.01637 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## dom_journeyTRUE 
##      0.06257449 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + dom_journey + 
## new.mod:     journey_prod
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mean.mod 27 52845 53022 -26395    52791                           
## new.mod  29 52840 53030 -26391    52782 8.4606      2    0.01455 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## journey_prod 
##   0.01379178 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + dom_battle
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  28 52826 53009 -26385    52770 20.854      1  4.957e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## dom_battleTRUE 
##     0.05810155 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + dom_battle + 
## new.mod:     battle_prod
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  29 52828 53018 -26385    52770 21.168      2  2.532e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  battle_prod 
## -0.004645673 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + only_battle
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  28 52828 53012 -26386    52772 18.321      1  1.866e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## only_battleTRUE 
##      0.05446475 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + only_battle + 
## new.mod:     battle_prod
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## mean.mod 27 52845 53022 -26395    52791                             
## new.mod  29 52830 53020 -26386    52772 18.415      2  0.0001003 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## battle_prod 
## 0.002301836 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + only_journey
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mean.mod 27 52845 53022 -26395    52791                           
## new.mod  28 52842 53026 -26393    52786 4.7846      1    0.02871 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## only_journeyTRUE 
##       0.06238475
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00114157
## (tol = 0.001, component 1)
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + only_journey + 
## new.mod:     journey_prod
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mean.mod 27 52845 53022 -26395    52791                           
## new.mod  29 52840 53030 -26391    52782 8.7829      2    0.01238 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## journey_prod 
##   0.01463864 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + scale(battle_salience)
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mean.mod 27 52845 53022 -26395    52791                           
## new.mod  28 52841 53024 -26392    52785 6.1139      1    0.01341 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## scale(battle_salience) 
##             0.01592889 
## Data: dat.m
## Models:
## mean.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## mean.mod:     text_length_words_sc + cancer_type + (1 | year)
## new.mod: mean_donation ~ shares_sc + friends_sc + updates_sc + goal_sc + 
## new.mod:     text_length_words_sc + cancer_type + (1 | year) + scale(journey_salience)
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)   
## mean.mod 27 52845 53022 -26395    52791                            
## new.mod  28 52836 53020 -26390    52780 10.294      1   0.001335 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## scale(journey_salience) 
##              0.01960493
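
Each comparison printed above is a likelihood-ratio test of the extended model against `mean.mod`. As a rough check on how the columns relate, the sketch below recomputes the p-value and the AIC change for the `any_metaphor` comparison from the reported Chisq value, then back-transforms the coefficient. The numbers are copied by hand from the output; the variable names are illustrative.

```r
# Values copied from the any_metaphor comparison above.
chisq  <- 34.678     # reported likelihood-ratio statistic
chi_df <- 1          # one extra parameter in new.mod
coef_any <- 0.0765246

# p-value: upper tail of the chi-squared distribution.
p_value <- pchisq(chisq, df = chi_df, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * k, so adding chi_df parameters lowers AIC
# by the LRT statistic minus 2 * chi_df.
aic_drop <- chisq - 2 * chi_df

# Multiplicative effect on mean donation under the log link.
effect_ratio <- exp(coef_any)

print(c(p_value = p_value, aic_drop = aic_drop, effect_ratio = effect_ratio))
```

The AIC drop of about 32.7 agrees with the rounded table values (52845 versus 52812), and the ratio of about 1.08 suggests projects using any metaphor have roughly 8% higher expected mean donations, conditional on the other predictors.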

Mean donation but with lmer

# mean2.formula = mean_donation+1 ~ shares_sc + friends_sc + updates_sc + photos_sc + goal_sc + text_length_words_sc + duration_float_sc + cancer_type + month + day_of_week + (1|year)
# 
# mean2.mod = lmer(mean2.formula, data = dat.m)
# 
# mean2.formula = update(mean2.formula,  ~ . - month)
# new.mod = lmer(mean2.formula, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# mean2.mod = new.mod
# 
# mean2.formula = update(mean2.formula,  ~ . - day_of_week)
# new.mod = lmer(mean2.formula, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# mean2.mod = new.mod
# 
# mean2.formula = update(mean2.formula,  ~ . - duration_float_sc)
# new.mod = lmer(mean2.formula, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# mean2.mod = new.mod
# 
# mean2.formula = update(mean2.formula,  ~ . - photos_sc)
# new.mod = lmer(mean2.formula, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# mean2.mod = new.mod
# summary(mean2.mod)
# AIC(mean2.mod)
# formula.temp = update(mean2.formula,  ~ . + no_metaphor)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["no_metaphorTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + any_metaphor)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["any_metaphorTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + dom_journey)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["dom_journeyTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + dom_journey + journey_prod)
# new.mod.prod = lmer(formula.temp, data = dat.m)
# anova(new.mod.prod, new.mod, test="Chisq")
# fixef(new.mod.prod)["journey_prod"]
# 
# formula.temp = update(mean2.formula,  ~ . + dom_battle)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["dom_battleTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + dom_battle + battle_prod)
# new.mod.prod = lmer(formula.temp, data = dat.m)
# anova(new.mod.prod, new.mod, test="Chisq")
# fixef(new.mod.prod)["battle_prod"]
# 
# formula.temp = update(mean2.formula,  ~ . + only_battle)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["only_battleTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + only_battle + battle_prod)
# new.mod.prod = lmer(formula.temp, data = dat.m)
# anova(new.mod.prod, new.mod, test="Chisq")
# fixef(new.mod.prod)["battle_prod"]
# 
# formula.temp = update(mean2.formula,  ~ . + only_journey)
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["only_journeyTRUE"]
# 
# formula.temp = update(mean2.formula,  ~ . + only_journey + journey_prod)
# new.mod.prod = lmer(formula.temp, data = dat.m)
# anova(new.mod.prod, new.mod, test="Chisq")
# fixef(new.mod.prod)["journey_prod"]
# 
# formula.temp = update(mean2.formula,  ~ . + scale(battle_salience))
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["scale(battle_salience)"]
# 
# formula.temp = update(mean2.formula,  ~ . + scale(journey_salience))
# new.mod = lmer(formula.temp, data = dat.m)
# anova(new.mod, mean2.mod, test="Chisq")
# fixef(new.mod)["scale(journey_salience)"]